-
-
Notifications
You must be signed in to change notification settings - Fork 44
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP+ENH: Add REMAP action for automatically remapping (instance) UIDs #203
Conversation
Maintains study / series relationships since the same input UID always produces the same output UID, while removing any potential PHI (output is generated with cryptograhically secure hash function).
Hey thanks for the PR! So this seems very hard coded for your specific use case (which is okay) but I want to challenge you to think of this more generally. E.g., what you really are doing is running a custom function given a specific VR value. E.g.,:
So why is this not just a replace with a custom function? The function is handed the field so you'd just do that check within the function and act appropriately. Is there a reason this approach will not work? |
E.g.:
def remap_uid(item, value, field):
"""Remap existing UID in stable and secure manner
Same input UID creates the same output UID, which keeps study / series
associations correct (or even FrameOfReferenceUID) provided the full
study / series in anonymized. At same time input UID can't be recovered
from output so any potential PHI (i.e. dates / times) is eliminated.
"""
if field.element.VR == 'UI':
new_val = remap_uid(field.element)
if isinstance(field.value, list):
# Handle VM > 1
return [generate_uid(entropy_srcs=[x]) for x in field.value]
else:
return generate_uid(entropy_srcs=[field.value])
# default to return the new value
return new_val |
I agree this could just be a separate function rather than a dedicated "action". The line is a little blurry, JITTER could just be a function too, right? Does the 'deid' package provide a library of useful functions like this that could then be reference in the default recipe? I think this remapping behavior for UIDs is a much better default than stripping them as the latter produces essentially "broken" DICOM files. |
The difference is that JITTER is an explicit but general action, and one that is generally requested across this process so the dates aren't the same. This
But I love this idea! Indeed it would be really useful to provide custom functions, and make it easy to contribute and then use! If you want to talk about a design to support that (and then make it easier for the user to use as opposed to needing to roll their own as you have) I'd definitely be open to this new addition. Perhaps the current deid/dicom/actions.py could be renamed to a module folder, e.g.,: deid/dicom/actions/
__init__.py
jitter.py <- the current jitter timestamp function and deid/dicom/actions/
__init__.py
jitter.py <- the current jitter timestamp function
uids.py Into which we can put the documentation example, along with your function here! And then for usage:
and all the custom functions would be imported into init.py and thus accessible via Let me know your thoughts! That's just a quick idea for a design. |
I agree, that is a better approach and will make it much easier to add more custom functions in the future. I will close this and open a different PR. |
Awesome! Looking forward to seeing it - please ping me if you want to have any discussion, etc. |
This is a wonderful idea! |
Description
This PR improves (instance) UID handling by allowing them to be automatically remapped through a new "REMAP" action. The new UIDs are created in a way that strips all PHI (usually timestamps embedded into the UID) while preserving the study/series hierarchy (defined by the UIDs) in the anonymized output. This allows the anonymized data to be(for example) uploaded to a PACS or other network entity.
Checklist
Open questions
I also updated the default recipe to handle UIDs in this way, if you prefer I don't do that (at least for now) let me know.